DNA storage
Could We Store Our Data in DNA?
A zettabyte is a trillion gigabytes. That's a lot, but according to one estimate, humanity will produce a hundred and eighty zettabytes of digital data this year. It all adds up: PowerPoints and selfies; video captured by cameras; electronic health records; data retrieved from smart devices or collected by telescopes and particle accelerators; backups, and backups of the backups. Where should it all go, and how much of it should be kept, and for how long? These questions vex the computer scientists who manage the world's storage. For them, the cloud isn't nebulous but a physical system that must be built, paid for, and maintained.
An easier-to-use technique for storing data in DNA is inspired by our cells
The new method, published in Nature last week, is more efficient, storing 350 bits at a time by encoding strands in parallel. Peking University's Long Qian and team got the idea for such templates from the way cells share the same basic set of genes but behave differently in response to chemical changes in DNA strands. "Every cell in our bodies has the same genome sequence, but genetic programming comes from modifications to DNA. If life can do this, we can do this," she says. Once the bricks are locked into their assigned spots on the strand, researchers select which bricks to methylate, with the presence or absence of the modification standing in for binary values of 0 or 1.
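The bit-to-methylation mapping described above can be illustrated with a minimal sketch. It models only the logical layer (one boolean per brick position), not the chemistry. The function names are hypothetical, and the choice that a methylated brick means 1 is an assumption; the article says only that presence or absence stands in for 0 or 1.

```python
BITS_PER_WRITE = 350  # block size reported in the article


def text_to_bits(text: str) -> str:
    """Turn text into a string of '0'/'1' characters (UTF-8, 8 bits per byte)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits; expects a multiple of 8 bits."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def bits_to_marks(bits: str) -> list[bool]:
    """One flag per brick position. ASSUMPTION: True (methylated) encodes 1."""
    if any(b not in "01" for b in bits):
        raise ValueError("bits must contain only '0' and '1'")
    return [b == "1" for b in bits]


def marks_to_bits(marks: list[bool]) -> str:
    """Read the methylation pattern back as a bit string."""
    return "".join("1" if m else "0" for m in marks)


def split_into_writes(marks: list[bool], size: int = BITS_PER_WRITE) -> list[list[bool]]:
    """Group brick flags into blocks of at most `size` bits per write."""
    return [marks[i:i + size] for i in range(0, len(marks), size)]
```

For example, `bits_to_text(marks_to_bits(bits_to_marks(text_to_bits("DNA"))))` round-trips the string through the methylation pattern and back.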
SemAI: Semantic Artificial Intelligence-enhanced DNA storage for Internet-of-Things
Wu, Wenfeng, Xiang, Luping, Liu, Qiang, Yang, Kun
In the wake of the swift evolution of technologies such as the Internet of Things (IoT), the global data landscape undergoes an exponential surge, propelling DNA storage into the spotlight as a prospective medium for contemporary cloud storage applications. This paper introduces a Semantic Artificial Intelligence-enhanced DNA storage (SemAI-DNA) paradigm, distinguishing itself from prevalent deep learning-based methodologies through two key modifications: 1) embedding a semantic extraction module at the encoding terminus, facilitating the meticulous encoding and storage of nuanced semantic information; 2) conceiving a forethoughtful multi-reads filtering model at the decoding terminus, leveraging the inherent multi-copy propensity of DNA molecules to bolster system fault tolerance, coupled with a strategically optimized decoder's architectural framework. Numerical results demonstrate the SemAI-DNA's efficacy, attaining 2.61 dB Peak Signal-to-Noise Ratio (PSNR) gain and 0.13 improvement in Structural Similarity Index (SSIM) over conventional deep learning-based approaches.
- Asia > China > Sichuan Province > Chengdu (0.04)
- Europe > United Kingdom > England > Essex > Colchester (0.04)
Learning Structurally Stabilized Representations for Multi-modal Lossless DNA Storage
Cao, Ben, He, Tiantian, Li, Xue, Wang, Bin, Wu, Xiaohu, Zhang, Qiang, Ong, Yew-Soon
In this paper, we present Reed-Solomon coded single-stranded representation learning (RSRL), a novel end-to-end model for learning representations for multi-modal lossless DNA storage. In contrast to existing learning-based methods, the proposed RSRL is inspired by both error-correction codec and structural biology. Specifically, RSRL first learns the representations for the subsequent storage from the binary data transformed by the Reed-Solomon codec. Then, the representations are masked by an RS-code-informed mask to focus on correcting the burst errors occurring in the learning process. With the decoded representations with error corrections, a novel biologically stabilized loss is formulated to regularize the data representations to possess stable single-stranded structures. By incorporating these novel strategies, the proposed RSRL can learn highly durable, dense, and lossless representations for the subsequent storage tasks into DNA sequences. The proposed RSRL has been compared with a number of strong baselines in real-world tasks of multi-modal data storage. The experimental results obtained demonstrate that RSRL can store diverse types of data with much higher information density and durability but much lower error rates.
DoDo-Code: a Deep Levenshtein Distance Embedding-based Code for IDS Channel and DNA Storage
Guo, Alan J. X., Sun, Sihan, Wei, Xiang, Wei, Mengyi, Chen, Xin
Recently, DNA storage has emerged as a promising data storage solution, offering significant advantages in storage density, maintenance cost efficiency, and parallel replication capability. Mathematically, the DNA storage pipeline can be viewed as an insertion, deletion, and substitution (IDS) channel. Because of the mathematical terra incognita of the Levenshtein distance, designing an IDS-correcting code is still a challenge. In this paper, we propose an innovative approach that utilizes deep Levenshtein distance embedding to bypass these mathematical challenges. By representing the Levenshtein distance between two sequences as a conventional distance between their corresponding embedding vectors, the inherent structural property of Levenshtein distance is revealed in the friendly embedding space. Leveraging this embedding space, we introduce the DoDo-Code, an IDS-correcting code that incorporates deep embedding of Levenshtein distance, deep embedding-based codeword search, and deep embedding-based segment correcting. To address the requirements of DNA storage, we also present a preliminary algorithm for long sequence decoding. As far as we know, the DoDo-Code is the first IDS-correcting code designed using plausible deep learning methodologies, potentially paving the way for a new direction in error-correcting code research. It is also the first IDS code that exhibits characteristics of being 'optimal' in terms of redundancy, significantly outperforming the mainstream IDS-correcting codes of the Varshamov-Tenengolts code family in code rate.
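For readers unfamiliar with the metric, the Levenshtein distance between two sequences is the minimum number of insertions, deletions, and substitutions needed to turn one into the other, which is why it fits the IDS channel. The sketch below is the classical dynamic-programming computation, not the paper's learned embedding; the function name is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] is the distance between the already-processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]
```

A single deleted base, as in `levenshtein("ACGT", "AGT")`, costs one edit, as does a single inserted or substituted base.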
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
Gartner: 10 tech trends you need to know for 2023
IT executives must look beyond cost savings to new forms of operational excellence and seek technologies that can help them optimize resilience, scale industry-specific solutions and product delivery, and pioneer new forms of engagement, according to the 10 top strategic technology trends for 2023 unveiled at Gartner's IT Symposium/Xpo 2022. These include multiple forms of wireless, artificial intelligence, and sustainability, according to Frances Karamouzis, distinguished vice president and analyst at Gartner, and external events are making IT pros' decisions about them even more difficult. "Depending on what region of the world you are in there are lots of looming issues such as a potential recession, supply chain concerns, the war in Ukraine and that impact, as well as energy-related issues," Karamouzis said. IT executives must focus on continuing to accelerate digital transformation and consider possible uses both for technologies that can be applied immediately and for those that are on the horizon. With that as background, Gartner's top 10 strategic technology trends for 2023 look like this: No single wireless technology will dominate, but enterprises will use a variety of wireless solutions to support a range of environments, from Wi-Fi in the office, services for mobile devices, low-power protocols, and even radio connectivity, Gartner stated.
- Europe > Ukraine (0.25)
- Europe > France (0.25)
- North America > United States (0.05)
- Information Technology > Services (0.71)
- Information Technology > Security & Privacy (0.70)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Communications > Networks (0.70)
Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning
Bar-Lev, Daniella, Orr, Itai, Sabary, Omer, Etzion, Tuvi, Yaakobi, Eitan
The concept of DNA storage was first suggested in 1959 by Richard Feynman, who shared his vision regarding nanotechnology in the talk "There is plenty of room at the bottom". Later, towards the end of the 20th century, interest in storage solutions based on DNA molecules increased as a result of the Human Genome Project, which in turn led to significant progress in sequencing and assembly methods. DNA storage enjoys major advantages over the well-established magnetic and optical storage solutions. As opposed to magnetic solutions, DNA storage does not require an electrical supply to maintain data integrity and is superior to other storage solutions in both density and durability. Given the trends in cost decreases of DNA synthesis and sequencing, it is now acknowledged that within the next 10-15 years DNA storage may become a highly competitive archiving technology and probably later the main such technology. With that said, the current implementations of DNA-based storage systems are very limited and are not fully optimized to address the unique pattern of errors which characterizes the synthesis and sequencing processes. In this work, we propose a robust, efficient and scalable solution to implement DNA-based storage systems. Our method deploys Deep Neural Networks (DNN) which reconstruct a sequence of letters based on an imperfect cluster of copies generated by the synthesis and sequencing processes. A tailor-made Error-Correcting Code (ECC) is utilized to combat patterns of errors which occur during this process. Since our reconstruction method is adapted to imperfect clusters, our method overcomes the time bottleneck of the noisy DNA copies clustering process by allowing the use of a rapid and scalable pseudo-clustering instead. Our architecture combines convolution and transformer blocks and is trained using synthetic data modelled after real data statistics.
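To make the reconstruction task concrete, here is a deliberately simple baseline, not the authors' DNN: a per-position majority vote over a cluster of noisy copies. It assumes every copy has the same length, so it can only handle substitution errors; insertions and deletions shift positions and would break it, which is part of why the problem calls for stronger methods. The function name is hypothetical.

```python
from collections import Counter


def majority_consensus(reads: list[str]) -> str:
    """Reconstruct a sequence by voting on each position across equal-length reads.

    ASSUMPTION: substitution errors only, so all reads are aligned by index.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")
    # zip(*reads) yields one tuple of bases per position (a column of the cluster).
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))
```

Given the copies `"ACGTAC"`, `"ACGTTC"`, and `"ACCTAC"`, each with at most one substituted base, the vote recovers `"ACGTAC"`.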
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)